PPOL 6805 / DSAN 6750: GIS for Spatial Data Science
Fall 2025
Tuesday, September 17, 2024
“Simplest” / Most Efficient Representation
“Indicate which of the seven geometries above would provide the simplest representation of the entity”
GEOMETRYCOLLECTION could represent any of the geographic entities in Q1, but would be overkill for representing e.g. a single point or line
For the United Arab Emirates (UAE) data… we have what we call the Not-Paraguay-Problem: Most countries, including the UAE, have a bunch of lil “pieces”:
So, for Q1.8: Assume we’re just trying to have the computer represent the “mainland” (the biggest contiguous landmass, with Dubai on it!)
From Krygier and Wood (2016)
The Earth’s “width” is slightly greater than its “length” 😰
From Wikimedia Commons
From Monmonier (2018)
Kieran Healy, “America’s Ur-Choropleths”
Kieran Healy, “America’s Ur-Choropleths”
From Reddit
From Wikimedia Commons
From Monmonier (2018)
From Monmonier (2018)
Is poverty a “significant issue” in the US?
From Krygier and Wood (2016)
Is poverty a “significant issue” in the US?
Assigns the same number of observations to each color
From Krygier and Wood (2016)
Is poverty a “significant issue” in the US?
Boundaries between colors come at regular (equal) intervals
From Krygier and Wood (2016)
Is poverty a “significant issue” in the US?
Clustering algorithm chooses classes to (a) minimize differences within classes, (b) maximize differences between classes
From Krygier and Wood (2016)
Is poverty a “significant issue” in the US?
A government program offers special funding for counties with above 25% poverty
From Krygier and Wood (2016)
Is poverty a “significant issue” in the US?
From Krygier and Wood (2016)
Using rnaturalearth with mapview
Computing the union of all geometries in the sf via sf::st_union()
library(leaflet.extras2)
africa_sf <- ne_countries(continent = "Africa", scale = 50)
africa_union_sf <- sf::st_union(africa_sf)
africa_map <- mapview(africa_sf, label="geounit", legend=FALSE)
africa_union_map <- mapview(africa_union_sf, label="st_union(africa)", legend=FALSE)
africa_map | africa_union_mapUse st_union() first:
Computing the centroid of all geometries in the sf via sf::st_centroid()
nc <- system.file("shape/nc.shp", package="sf") |>
read_sf() |>
st_transform('EPSG:2264')
gr <- st_sf(
label = apply(expand.grid(1:10, LETTERS[10:1])[,2:1], 1, paste0, collapse = ""),
geom = st_make_grid(nc))
gr$col <- sf.colors(10, categorical = TRUE, alpha = .3)
# cut, to verify that NA's work out:
gr <- gr[-(1:30),]
suppressWarnings(nc_j <- st_join(nc, gr, largest = TRUE))
par(mfrow = c(2,1), mar = rep(0,4))
plot(st_geometry(nc_j), border = 'grey')
plot(st_geometry(gr), add = TRUE, col = gr$col)
text(st_coordinates(st_centroid(st_geometry(gr))), labels = gr$label, cex = .85)
# the joined dataset:
plot(st_geometry(nc_j), border = 'grey', col = nc_j$col)
text(st_coordinates(st_centroid(st_geometry(nc_j))), labels = nc_j$label, cex = .7)
plot(st_geometry(gr), border = '#88ff88aa', add = TRUE)
st_intersectslengths() [1] 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0
[39] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2
| geounit | num_points | geometry |
|---|---|---|
| Algeria | 2 | MULTIPOLYGON (((8.576563 36… |
| Angola | 2 | MULTIPOLYGON (((13.07275 -4… |
| Benin | 0 | MULTIPOLYGON (((1.622656 6…. |
| Botswana | 0 | MULTIPOLYGON (((25.25879 -1… |
| Burkina Faso | 0 | MULTIPOLYGON (((0.9004883 1… |
| Burundi | 0 | MULTIPOLYGON (((30.55361 -2… |
mapviewggplot2Since we’re starting to get into data attributes rather than geometric features, switching to ggplot2 is recommended!
st_buffer()POINT) from Manhattan (POLYGON)?”POLYGON) / stretch of road (LINESTRING)?”POLYGONsKey line: manhattan_buffer_sf <- manhattan_union_sf |> st_buffer(dist = 1609.34) (1 mile \(\approx\) 1609.34 meters)
library(tidyverse)
library(sf)
library(tidycensus)
library(tigris)
library(mapview)
options(tigris_use_cache = TRUE)
manhattan_sf <- get_acs(
geography = "tract",
variables = "B19013_001",
state = "NY",
county = "New York",
year = 2020,
geometry = TRUE,
cb = FALSE
)
# Erase the island tracts real quick
island_tracts <- c(
"Census Tract 1, New York County, New York",
"Census Tract 2.02, New York County, New York"
)
manhattan_sf <- manhattan_sf |> filter(
!(NAME %in% island_tracts)
)
# Union of all census tracts within the county
manhattan_union_sf <- st_union(manhattan_sf)
manhattan_union_map <- mapview(manhattan_union_sf, label="New York County")
# Construct buffer (1 mile ~= 1609.34 meters)
manhattan_buffer_sf <- manhattan_union_sf |> st_buffer(dist = 1609.34)
manhattan_buffer_map <- mapview(manhattan_buffer_sf, label="Buffer (1 Mile)", col.regions = cbPalette[1], legend = TRUE)
manhattan_buffer_map + manhattan_union_mapLINESTRINGslibrary(tidyverse)
library(sf)
## st_buffer, style options (taken from rgeos gBuffer)
l1 = st_as_sfc("LINESTRING(0 0,1 5,4 5,5 2,8 2,9 4,4 6.5)")
op = par(mfrow=c(2,3))
plot(st_buffer(l1, dist = 1, endCapStyle="ROUND"), reset = FALSE, main = "endCapStyle: ROUND")
plot(l1,col='blue',add=TRUE)
plot(st_buffer(l1, dist = 1, endCapStyle="FLAT"), reset = FALSE, main = "endCapStyle: FLAT")
plot(l1,col='blue',add=TRUE)
plot(st_buffer(l1, dist = 1, endCapStyle="SQUARE"), reset = FALSE, main = "endCapStyle: SQUARE")
plot(l1,col='blue',add=TRUE)
plot(st_buffer(l1, dist = 1, nQuadSegs=1), reset = FALSE, main = "nQuadSegs: 1")
plot(l1,col='blue',add=TRUE)
plot(st_buffer(l1, dist = 1, nQuadSegs=2), reset = FALSE, main = "nQuadSegs: 2")
plot(l1,col='blue',add=TRUE)
plot(st_buffer(l1, dist = 1, nQuadSegs= 5), reset = FALSE, main = "nQuadSegs: 5")
plot(l1,col='blue',add=TRUE)From the sf Documentation
st_centroid()
POLYGON \(\mapsto\) POINTMULTIPOLYGON \(\mapsto\) POINTst_convex_hull()
POLYGON \(\mapsto\) POLYGONMULTIPOINT \(\mapsto\) POLYGONst_intersection()
POINT, POINT) \(\mapsto\) POINT | POINT EMPTYPOLYGON, POLYGON) \(\mapsto\) POLYGON | LINESTRING | POINT | POLYGON EMPTYst_is_empty() and st_dimension() become your new best friends 😉st_is_empty(): Distinguishes between, e.g., POINT EMPTY and POINT(0 0)st_dimension(): NA for empty versions, otherwise
2 for surfaces (POLYGON, MULTIPOLYGON)1 for lines (LINESTRING, MULTILINESTRING)0 for points (POINT, MULTIPOINT)
Unary Operations

Binary Operations

Ternary Operations

Quaternary Operations


(i spent 4 yrs of undergrad studying abstract algebra and now it all sits gathering dust somewhere deep within my brain plz just let me have this moment i’ll never mention it again i promise)
Figure 4.2 in Lovelace, Nowosad, and Muenchow (2024)
Each cell here visualizes one component of the DE-9IM string 1020F1102, which describes the relationship between the following two geometries:
POLYGON(0 0, 1 0, 1 1, 0 1, 0 0)LINESTRING(0.5 -0.5, 0.5 0.5)library(sf)
polygon <- po <- st_polygon(list(rbind(c(0,0), c(1,0), c(1,1), c(0,1), c(0,0))))
p0 <- st_polygon(list(rbind(c(-1,-1), c(2,-1), c(2,2), c(-1,2), c(-1,-1))))
line <- li <- st_linestring(rbind(c(.5, -.5), c(.5, 0.5)))
s <- st_sfc(po, li)
par(mfrow = c(3,3))
par(mar = c(1,1,1,1))
# "1020F1102"
# 1: 1
plot(s, col = c(NA, 'darkgreen'), border = 'blue', main = expression(paste("I(pol)",intersect(),"I(line) = 1")))
lines(rbind(c(.5,0), c(.5,.495)), col = 'red', lwd = 2)
points(0.5, 0.5, pch = 1)
# 2: 0
plot(s, col = c(NA, 'darkgreen'), border = 'blue', main = expression(paste("I(pol)",intersect(),"B(line) = 0")))
points(0.5, 0.5, col = 'red', pch = 16)
# 3: 2
plot(s, col = c(NA, 'darkgreen'), border = 'blue', main = expression(paste("I(pol)",intersect(),"E(line) = 2")))
plot(po, col = '#ff8888', add = TRUE)
plot(s, col = c(NA, 'darkgreen'), border = 'blue', add = TRUE)
# 4: 0
plot(s, col = c(NA, 'darkgreen'), border = 'blue', main = expression(paste("B(pol)",intersect(),"I(line) = 0")))
points(.5, 0, col = 'red', pch = 16)
# 5: F
plot(s, col = c(NA, 'darkgreen'), border = 'blue', main = expression(paste("B(pol)",intersect(),"B(line) = F")))
# 6: 1
plot(s, col = c(NA, 'darkgreen'), border = 'blue', main = expression(paste("B(pol)",intersect(),"E(line) = 1")))
plot(po, border = 'red', col = NA, add = TRUE, lwd = 2)
# 7: 1
plot(s, col = c(NA, 'darkgreen'), border = 'blue', main = expression(paste("E(pol)",intersect(),"I(line) = 1")))
lines(rbind(c(.5, -.5), c(.5, 0)), col = 'red', lwd = 2)
# 8: 0
plot(s, col = c(NA, 'darkgreen'), border = 'blue', main = expression(paste("E(pol)",intersect(),"B(line) = 0")))
points(.5, -.5, col = 'red', pch = 16)
# 9: 2
plot(s, col = c(NA, 'darkgreen'), border = 'blue', main = expression(paste("E(pol)",intersect(),"E(line) = 2")))
plot(p0 / po, col = '#ff8888', add = TRUE)
plot(s, col = c(NA, 'darkgreen'), border = 'blue', add = TRUE)Code from Pebesma and Bivand (2023)
equals corresponds to the DE-9IM string "T*F**FFF*". If any two geometries obey this relationship, they are (topologically) equal!| 9IM | Interior | Boundary | Exterior |
|---|---|---|---|
| Interior | ![]() |
![]() |
![]() |
| Boundary | ![]() |
![]() |
![]() |
| Exterior | ![]() |
![]() |
![]() |
| 9IM | Interior | Boundary | Exterior |
|---|---|---|---|
| Interior | ![]() 2 |
![]() 1 |
![]() 2 |
| Boundary | ![]() 1 |
![]() 0 |
![]() 1 |
| Exterior | ![]() 2 |
![]() 1 |
![]() 2 |
| DE-9IM | Interior | Boundary | Exterior |
|---|---|---|---|
| Interior | 2 | 1 | 2 |
| Boundary | 1 | 0 | 1 |
| Exterior | 2 | 1 | 2 |
212101212

st_overlaps() |
Interior | Boundary | Exterior |
|---|---|---|---|
| Interior | T |
* |
T |
| Boundary | * |
* |
* |
| Exterior | T |
* |
* |
0, 1, 2):
T: “True” (non-empty, st_dimension() >= 0)F: “False” (empty, st_dimension() == NA)*: “Wildcard” (Don’t care what the value is)st_overlaps(): T*T***T**, st_equals(): T*F**FFF*| English | Mask | 212101212 |
Result |
|---|---|---|---|
| “Disjoint” | FF*FF**** |
FALSE |
x not disjoint from y |
| “Touches” | FT******* |
FALSE |
x doesn’t touch y |
| “Touches” | F***T**** |
FALSE |
x doesn’t touch y |
| “Crosses” | T*T***T** |
TRUE |
x crosses y |
| “Within” | TF*F***** |
FALSE |
x is not within y |
| “Overlaps” | T*T***T** |
TRUE |
x overlaps y |
st_relate(): The Ultimate Predicate| iso_3166_2 | name | de9im | touch | geometry |
|---|---|---|---|---|
| US-AL | Alabama | FF2FF1212 | FALSE | MULTIPOLYGON (((-87.41958 3… |
| US-AK | Alaska | FF2FF1212 | FALSE | MULTIPOLYGON (((-141.0056 6… |
| US-AZ | Arizona | FF2FF1212 | FALSE | MULTIPOLYGON (((-111.0063 3… |
| US-AR | Arkansas | FF2FF1212 | FALSE | MULTIPOLYGON (((-90.30422 3… |
| US-CA | California | FF2FF1212 | FALSE | MULTIPOLYGON (((-114.7243 3… |
| US-CO | Colorado | FF2FF1212 | FALSE | MULTIPOLYGON (((-109.0463 4… |
| US-CT | Connecticut | FF2FF1212 | FALSE | MULTIPOLYGON (((-73.6417 41… |
| US-DE | Delaware | FF2FF1212 | FALSE | MULTIPOLYGON (((-75.05809 3… |
| US-DC | District of Columbia | 2FFF1FFF2 | FALSE | MULTIPOLYGON (((-77.02293 3… |
| US-FL | Florida | FF2FF1212 | FALSE | MULTIPOLYGON (((-87.44734 3… |
| US-GA | Georgia | FF2FF1212 | FALSE | MULTIPOLYGON (((-80.89029 3… |
| US-HI | Hawaii | FF2FF1212 | FALSE | MULTIPOLYGON (((-154.8996 1… |
| US-ID | Idaho | FF2FF1212 | FALSE | MULTIPOLYGON (((-117.0382 4… |
| US-IL | Illinois | FF2FF1212 | FALSE | MULTIPOLYGON (((-89.1237 36… |
| US-IN | Indiana | FF2FF1212 | FALSE | MULTIPOLYGON (((-84.80608 4… |
| US-IA | Iowa | FF2FF1212 | FALSE | MULTIPOLYGON (((-96.48266 4… |
| US-KS | Kansas | FF2FF1212 | FALSE | MULTIPOLYGON (((-102.0396 3… |
| US-KY | Kentucky | FF2FF1212 | FALSE | MULTIPOLYGON (((-89.42446 3… |
| US-LA | Louisiana | FF2FF1212 | FALSE | MULTIPOLYGON (((-89.52599 3… |
| US-ME | Maine | FF2FF1212 | FALSE | MULTIPOLYGON (((-71.08495 4… |
| US-MD | Maryland | FF2F11212 | TRUE | MULTIPOLYGON (((-75.64786 3… |
| US-MA | Massachusetts | FF2FF1212 | FALSE | MULTIPOLYGON (((-71.19396 4… |
| US-MI | Michigan | FF2FF1212 | FALSE | MULTIPOLYGON (((-84.4913 46… |
| US-MN | Minnesota | FF2FF1212 | FALSE | MULTIPOLYGON (((-97.22609 4… |
| US-MS | Mississippi | FF2FF1212 | FALSE | MULTIPOLYGON (((-88.40221 3… |
| US-MO | Missouri | FF2FF1212 | FALSE | MULTIPOLYGON (((-95.31725 4… |
| US-MT | Montana | FF2FF1212 | FALSE | MULTIPOLYGON (((-116.0482 4… |
| US-NE | Nebraska | FF2FF1212 | FALSE | MULTIPOLYGON (((-104.0537 4… |
| US-NV | Nevada | FF2FF1212 | FALSE | MULTIPOLYGON (((-114.0425 4… |
| US-NH | New Hampshire | FF2FF1212 | FALSE | MULTIPOLYGON (((-71.50585 4… |
| US-NJ | New Jersey | FF2FF1212 | FALSE | MULTIPOLYGON (((-75.54133 3… |
| US-NM | New Mexico | FF2FF1212 | FALSE | MULTIPOLYGON (((-108.1375 3… |
| US-NY | New York | FF2FF1212 | FALSE | MULTIPOLYGON (((-79.06523 4… |
| US-NC | North Carolina | FF2FF1212 | FALSE | MULTIPOLYGON (((-76.03173 3… |
| US-ND | North Dakota | FF2FF1212 | FALSE | MULTIPOLYGON (((-104.0476 4… |
| US-OH | Ohio | FF2FF1212 | FALSE | MULTIPOLYGON (((-80.52023 4… |
| US-OK | Oklahoma | FF2FF1212 | FALSE | MULTIPOLYGON (((-103.0002 3… |
| US-OR | Oregon | FF2FF1212 | FALSE | MULTIPOLYGON (((-124.4924 4… |
| US-PA | Pennsylvania | FF2FF1212 | FALSE | MULTIPOLYGON (((-79.76301 4… |
| US-RI | Rhode Island | FF2FF1212 | FALSE | MULTIPOLYGON (((-71.23686 4… |
| US-SC | South Carolina | FF2FF1212 | FALSE | MULTIPOLYGON (((-78.57316 3… |
| US-SD | South Dakota | FF2FF1212 | FALSE | MULTIPOLYGON (((-104.0567 4… |
| US-TN | Tennessee | FF2FF1212 | FALSE | MULTIPOLYGON (((-90.30422 3… |
| US-TX | Texas | FF2FF1212 | FALSE | MULTIPOLYGON (((-103.3115 2… |
| US-UT | Utah | FF2FF1212 | FALSE | MULTIPOLYGON (((-111.0502 4… |
| US-VT | Vermont | FF2FF1212 | FALSE | MULTIPOLYGON (((-73.35134 4… |
| US-VA | Virginia | FF2F11212 | TRUE | MULTIPOLYGON (((-76.01325 3… |
| US-WA | Washington | FF2FF1212 | FALSE | MULTIPOLYGON (((-122.753 48… |
| US-WV | West Virginia | FF2FF1212 | FALSE | MULTIPOLYGON (((-82.58945 3… |
| US-WI | Wisconsin | FF2FF1212 | FALSE | MULTIPOLYGON (((-87.80425 4… |
| US-WY | Wyoming | FF2FF1212 | FALSE | MULTIPOLYGON (((-109.0463 4… |
library(mapview)
library(leaflet.extras2)
africa_sf <- ne_countries(continent = "Africa", scale = 50) |> select(iso_a3, geounit)
africa_map <- mapview(africa_sf, label="geounit", legend=FALSE)
N <- 10
africa_union_sf <- sf::st_union(africa_sf)
sampled_points_sf <- sf::st_sample(africa_union_sf, N) |> sf::st_sf() |> mutate(temp = runif(N, 0, 100))
sampled_points_map <- mapview(sampled_points_sf, label="Random Point", col.regions=cbPalette[1], legend=FALSE)
countries_points_sf <- africa_sf[sampled_points_sf,]
filtered_map <- mapview(countries_points_sf, label="geounit", legend=FALSE) + sampled_points_map
(africa_map + sampled_points_map) | filtered_mapThe issue: Data attributes of POINTs are not merged into data attributes of POLYGONs
POINT Attributes
st_join()| iso_a3 | geounit | temp | geometry | |
|---|---|---|---|---|
| 2 | ZMB | Zambia | 5.788125 | MULTIPOLYGON (((30.39609 -1… |
| 58 | SOM | Somalia | 95.301751 | MULTIPOLYGON (((41.53271 -1… |
| 59 | -99 | Somaliland | 41.214222 | MULTIPOLYGON (((48.93857 11… |
| 91 | NGA | Nigeria | 56.409432 | MULTIPOLYGON (((7.300781 4…. |
| 114 | MLI | Mali | 99.141948 | MULTIPOLYGON (((-11.3894 12… |
| 123 | LBY | Libya | 86.126311 | MULTIPOLYGON (((9.51875 30…. |
Assume the extensive attribute \(Y\) is uniformly distributed over a space \(S_i\) (e.g., for population counts we assume everyone is evenly-spaced across the region)
We first compute \(Y_{ij}\), derived from \(Y_i\) for a sub-area of \(S_i\), \(A_{ij} = S_i \cap T_j\):
\[ \hat{Y}_{ij}(A_{ij}) = \frac{|A_{ij}|}{|S_i|}Y_i(S_i) \]
where \(|\cdot|\) denotes area.
Then we can compute \(Y_j(T_j)\) by summing all the elements over area \(T_j\):
\[ \hat{Y}_j(T_j) = \sum_{i=1}^{p}\frac{|A_{ij}|}{|S_i|}Y_i(S_i) \]
\[ \hat{Y}_{ij} = Y_i(S_i) \]
\[ \hat{Y}_j(T_j) = \sum_{i=1}^{p}\frac{|A_{ij}|}{|T_j|}Y_j(S_i) \]
PPOL 6805 Week 4: Unary and Binary Operations